Literate C++
Marco S. Hyman
uucp: ...!pacbell!dumbcat!marc
Next time you're feeling bored, ask a group of programmers to define good
program documentation -- then duck before every volume of The Art of Computer
Programming is thrown your way. The extremes are easy to identify. There is
usually at least one in the group insisting that the only documentation ever
needed is the source, the whole source, and nothing but the source. At the
other end of the spectrum will be the programmer bent under weighty volumes of
requirements analysis documents, system design documents, HIPO charts, data
flow diagrams, data dictionaries, structure charts, and, of course, source
listings.
One of the reasons for such diverse opinions is that each programmer is
likely to have a different documentation goal. Some programmers want to
explain algorithms, others want to show data flow and state transitions. There
are also those who do the bare minimum required by the organization they work
for. This last group can't see any purpose in writing documentation, so
perhaps we should state one:
"The purpose of program documentation is to provide enough information for
another programmer to understand and maintain the program."
If this purpose seems a bit altruistic, substitute ``another programmer''
with ``you, after not looking at the program for a year.'' The purpose is just
as strong and certainly hits closer to home.
But what about C++ class documentation? One of the advantages of C++ and
object-oriented programming is that it leads to code re-use. However, a
programmer is not likely to re-use code when its function is a mystery. Forcing
another programmer to look at your implementation to discover what your code
does is not polite. It leads to your code being tossed in favor of code the
other guy understands. Code is not re-usable until it's documented.
"The purpose of class documentation is to provide enough information for
another programmer to use the class and member functions of the class."
Two levels of documentation are needed; the first level for users of a
class and the second level for maintainers of the class. Class users need
something like the pages in your C library reference manual. Class maintainers
need to know the algorithms used and WHY the code is the way it is. Both the
code and the class documentation will convey WHAT the class does.
Literate Programming
Don Knuth conceived the idea of programs as works of literature and created
``Literate Programming'' (see sidebar) as a method of explaining to programmers
what the computer was to do. In Knuth's implementation a program is written in
WEB, a language consisting of both TeX text and Pascal text. The combined text
is processed by two programs, TANGLE and WEAVE, to produce both Pascal source
code and TeX formatted documentation.
There are many advantages to keeping the documentation and the code in the
same file. A programmer is more likely to update both when both are on the
screen at the same time, thus keeping the program and documentation in sync. It
also becomes impossible to lose the documentation (without also losing the
source). With the proper tools a pretty printed version of the source can be
included in the documentation. Most important, the programmer is encouraged to
think about both documentation and code. This usually has a side effect of
improving both.
To see if the literate programming paradigm can be stretched to include C++,
I propose Literate C++ (lc++).*1 As shown in figure 1, an lc++ input file
(file.lc) will be used to create both a C++ header file (file.h) and a C++
source file (file.cc). The lc++ input file (file.lc) can also be processed to
create a library manual page (file.3) and a class documentation file
(file.doc).
An lc++ language has been designed and a header and source file extraction
program, named lcpp, has been prototyped in awk.*2 They are both described
below. My goal is to use the prototype program for a while, refining the lc++
language along the way, and then to use it to generate a second-generation
extraction tool written in lc++. The next step will be to determine a good
format for both types of documentation and to create the documentation
extraction tools.
The lc++ Language
The lc++ commands are listed in figure 2. Each command starts with an at sign
(@) and currently must be the first token on a line, although this is likely to
change.
Each lc++ input (.lc) file creates one header file and one source file. If
multiple headers are required, for example, then each must have its own lc++
file. One of the purposes of the awk prototype is to determine if this
limitation is reasonable.
The .lc file contains three sections. The first section consists of all
text and commands prior to the @specification command. Text in this section is
ignored by both the code and documentation extraction tools. Commands usually
found in this section are @title and @copyright.
The second section of the file starts with @specification. Code and text
in the specification section are used to create the header (.h) file and the
library man page (.3) file. The code in this section defines classes and
declares member functions.
The final section starts with the @implementation command. This section is
used to define the member functions declared in the specification section. The text
in this section is free form and used to explain what is being done and why.
Code and text in this section create the source (.cc) file and the
documentation (.doc) file. @inline commands will cause code to be added to the
end of the header file. A description of each command follows.
@title: The @title command causes a title, perhaps including version
information, to be written to both the .h and the .cc output files. Along with
the title is a canned notice that explains that the output files should not be
modified directly, but that changes should be applied to the input (.lc) file
and lc++ run again to generate new output files.
@copyright: The @copyright statement is written to both output files.
Comment delimiters must be supplied. This is not done automatically as
different authors prefer different commenting styles. The copyright section is
also a good place to include a change log, such as that built by RCS or any
other version control system you may be using.
@code: The @code command enables output. The location of the output, .h
file or .cc file, depends upon the current mode (specification or
implementation). All lines following the @code line are written to the current
file. Output continues until a command that disables output is encountered.
An @code is not always required for output. The @copyright command above, for
example, enables output by default.
@text: The @text command disables .h or .cc output and signifies the
beginning of documentation. The text will be written to the library manual
page (.3) file when @text is seen in the @specification section. Text will be
written to the documentation (.doc) file when seen in the implementation
section. Alternating @text and @code commands are often seen as the author
goes back and forth between coding and documenting.
@specification: The @specification command selects specification mode.
Classes are defined in this mode and class members are declared. In
specification mode @code output is written to the .h file immediately. Other
output, such as class definitions and member declarations, is not written
until specification mode ends. Output is not enabled by the @specification
command. The mode is ended by the end of the lc++ input (.lc) file or by an
@implementation command.
@class: The @class command starts the definition of a new class. Classes
are always output in the order that the @class command is found in the lc++
input (.lc) file. No output is done until the specification section of the .lc
file is finished. If circular class definitions are required, use a class x;
declaration in an @code block before the @class definition.
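The circular case can be sketched in plain C++. The header below is roughly the shape one would want the generated .h file to take when two classes refer to each other. The class names Node and Graph are hypothetical and chosen only for illustration; the forward declaration is the line that would sit in an @code block ahead of the @class commands, since @code output reaches the header before the buffered class definitions do.

```cpp
#include <cassert>

// Forward declaration: the line that would go in an @code block before the
// @class commands, so that it is written to the header first.
class Graph;

// Node mentions Graph only through a pointer, so the incomplete type suffices.
class Node {
public:
    explicit Node(Graph* owner) : owner_(owner) {}
    Graph* owner() const { return owner_; }
private:
    Graph* owner_;
};

// Graph is defined second and can hold a complete Node by value.
class Graph {
public:
    Graph() : root_(this) {}
    const Node& root() const { return root_; }
private:
    Node root_;
};
```

Constructing a Graph and calling root().owner() should hand back the address of that same Graph, which shows that both definitions compile in this order.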
@base: The @base command declares the base classes that make up a class.
The syntax of the command is @base @<classname> <base class description>. The
@<classname> is optional. By default, an @base command adds a base class to
the last class defined with the @class command. Because it may be easier to
document related classes by bouncing between them, it is possible to add a base
class to any previously defined class by using the @<classname> syntax.
Example:
@class Class1                 // defines Class1
@class Class2                 // defines Class2
@base virtual public Base2    // adds Base2 as a base class of Class2
@base @Class1 public Base1    // adds Base1 as a base class of Class1
@base private Base2p          // adds Base2p as another base class
                              // of Class2 (the last defined class)
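The way an @base command picks its target can be sketched as follows. This is a C++ illustration rather than the awk prototype, and ClassTable, addClass, addBase, and header are hypothetical names. The spacing of the generated class header is also an assumption and may differ from what lcpp prints.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical table of classes and their base-class descriptions.
class ClassTable {
public:
    void addClass(const std::string& name) {
        bases_[name];          // create an empty base list for the class
        last_ = name;          // remember the most recently defined class
    }
    // target plays the role of the optional @<classname>;
    // an empty target means "the last class defined".
    bool addBase(const std::string& description, const std::string& target = "") {
        std::string name = target.empty() ? last_ : target;
        auto it = bases_.find(name);
        if (it == bases_.end()) return false;   // no such class
        it->second.push_back(description);      // naming a target does not change last_
        return true;
    }
    // Builds "class X : base1, base2" with bases in the order they were added.
    std::string header(const std::string& name) const {
        std::string out = "class " + name;
        auto it = bases_.find(name);
        if (it == bases_.end()) return out;
        for (std::size_t i = 0; i < it->second.size(); ++i)
            out += (i == 0 ? " : " : ", ") + it->second[i];
        return out;
    }
private:
    std::map<std::string, std::vector<std::string>> bases_;
    std::string last_;
};
```

Feeding it the five commands from the example gives Class1 a single base and Class2 two bases, in the order the @base commands appeared.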
@public, @protected, and @private: These three commands add members to the
current class. As with the @base command, members can be added to a previously
named class by adding an @<classname> after the @public, @protected, or @private
command. The text on the command line after the command will be copied into
the class definition. Proper C++ syntax must be followed.
The text following the command should explain when to use the member and
what the member does. Of course, this pertains to member functions much more
so than data members. How member functions are implemented is *not*
appropriate subject matter here. This is still part of the specification. The
implementation could vary many ways and still meet the specification. This text
is *not* added to the header file.
@requires: The @requires command introduces text that describes caller
requirements. That is, if the requirements are not followed, then the called
function is not required to work. An example of a @requires clause would be
that only positive numbers are passed to a square root function. This command always
pertains to the last @public, @protected, or @private command found in the lc++
input (.lc) file.
@effects: This command introduces a very brief description of what the
member function does, that is, the effects of calling the member function.
The description is used to generate class documentation. This command always
pertains to the last @public, @protected, or @private command found in the lc++
input (.lc) file.
See Abstraction and Specification in Program Development by Liskov and
Guttag on specifying procedures by use of a requires and an effects clause.
They also use a modifying clause which could be added to lc++.
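A requires clause and an effects clause might read as follows when written as ordinary C++ comments. This is only a sketch: integerSqrt is a hypothetical function, and the assert that checks the requirement is an addition made here for illustration, not something lc++ generates.

```cpp
#include <cassert>
#include <cmath>

// requires: x >= 0. The function need not work for negative x.
// effects:  returns the largest integer r such that r * r <= x.
long integerSqrt(long x) {
    assert(x >= 0);   // checks the requires clause in debug builds
    long r = static_cast<long>(std::sqrt(static_cast<double>(x)));
    // Correct any floating-point rounding so the effects clause holds exactly.
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r;
}
```

The two comment lines are the specification. The body could be replaced by any other algorithm without changing them, which is the point of keeping implementation details out of the specification section.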
@implementation: The @implementation command forces the classes defined so
far to be written to the header (.h) file and switches @code output to the
source (.cc) file. All output, except for @inline (see below), will now be
written to the source file. Text written after the @implementation command
should discuss implementation details; more of the *how* than the *why*.
@member: The @member command starts the definition of a member function.
All lines following the @member command will be copied to the source (.cc)
file. The documentation extraction programs, when written, will make use of
this command. In the source extraction program it acts as an @code.
@inline: The @inline command adds the lines following the command to the
header (.h) file. The member function should have been declared as inline in
the specification section of the file. This is not verified, however, and can
lead to problems. For this reason future versions of the language will not use
this keyword.
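Taken together, the command descriptions above amount to a small state machine that decides where each input line goes. The sketch below models only that routing, in C++ rather than awk. Router and Dest are hypothetical names, the buffering of class definitions is not modelled, and neither is the notice that @title writes to both files.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Where a line of input ends up. Both flags may be set (after @copyright).
struct Dest {
    bool header;
    bool source;
};

// Hypothetical model of the routing rules only.
class Router {
public:
    Dest feed(const std::string& line) {
        std::istringstream in(line);
        std::string word;
        in >> word;                        // first token on the line
        if (!word.empty() && word[0] == '@') {
            enabled_ = false;              // every command turns output off first
            if (word == "@copyright")           { header_ = source_ = enabled_ = true; }
            else if (word == "@specification")  { header_ = true;  source_ = false; }
            else if (word == "@implementation") { header_ = false; source_ = true; }
            else if (word == "@code")           { enabled_ = true; }
            else if (word == "@member")         { header_ = false; source_ = true; enabled_ = true; }
            else if (word == "@inline")         { header_ = true;  source_ = false; enabled_ = true; }
            return Dest{false, false};     // command lines themselves are not copied
        }
        return Dest{enabled_ && header_, enabled_ && source_};
    }
private:
    bool enabled_ = false;   // is output currently on?
    bool header_ = false;    // would output go to the .h file?
    bool source_ = false;    // would output go to the .cc file?
};
```

One consequence worth noticing is that @code after @inline keeps writing to the header, because @code only re-enables output and leaves the destination alone. That is what allows descriptive text to be dropped into the middle of an inline function.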
The AWK extraction program
Listing 1 is lcpp, the awk program used to process the lc++ input file. It uses
the features of new awk, as described in The AWK Programming Language by Aho,
Kernighan, and Weinberger. The program is fairly simple and should be easy to
understand.
Two arrays, class and className, are used to associate a class name with a
class number. The array class returns a class number when indexed by a name.
The array className returns a name when indexed by a number.
The only tricky bit of coding is in the use of awk's associative arrays to
force classes and member functions to be output in the same sequence they were
input. The member array illustrates the technique. Whenever a new class is
defined, three entries are added to the member array under the class number
classNum: member[classNum, "public"], member[classNum, "protected"], and
member[classNum, "private"]. Each of the three entries is a count, initialized
to 0. The count is then used as the third index into the array when a member
definition occurs. A public member definition would be added at
member[classNum, "public", member[classNum, "public"]]. Note that this entry
uses three indexes and the third is the current count. The count is
incremented after the entry is added. The functions doClass and doMembers use
these embedded counts to control printing.
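The embedded-count idea carries over to other languages. The following C++ sketch mimics it with two maps, one keyed by (class number, type) that holds the count and one keyed by (class number, type, slot) that holds the declaration. MemberTable and its member names are hypothetical; lcpp itself keeps both kinds of entry in the single awk array member.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for lcpp's member array.
class MemberTable {
public:
    // Store a declaration in the slot named by the current count, then bump the count.
    void add(int classNum, const std::string& type, const std::string& decl) {
        int& count = counts_[std::make_pair(classNum, type)];   // starts at 0
        entries_[std::make_tuple(classNum, type, count)] = decl;
        ++count;
    }
    // Walk slots 0..count-1 so declarations come back in input order.
    std::vector<std::string> list(int classNum, const std::string& type) const {
        std::vector<std::string> out;
        auto c = counts_.find(std::make_pair(classNum, type));
        if (c == counts_.end()) return out;
        for (int i = 0; i < c->second; ++i)
            out.push_back(entries_.at(std::make_tuple(classNum, type, i)));
        return out;
    }
private:
    std::map<std::pair<int, std::string>, int> counts_;
    std::map<std::tuple<int, std::string, int>, std::string> entries_;
};
```

Because retrieval counts upward through the slots instead of iterating over the map, the order of the output does not depend on how the container happens to order its keys. The awk program needs this for the same reason, since awk does not promise any particular order when looping over an array with for (key in array).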
A Short Example
Listing 2 contains a short example of literate C++. The code doesn't do
anything except to illustrate some of the features of the language. Note how
descriptive text can be placed anywhere in the file. When processed by awk and
lcpp two output files are created. With the input file named test.lc the
output files are named test.h (listing 3) and test.cc (listing 4). The command
line used to generate these files was
awk -f lcpp test.lc
but this may vary between operating systems and versions of awk.
The definition of Literate C++ is not complete. Non-member functions are
not handled and inline member functions must be declared as inline in too many
places. Also, little thought has been given to how documentation should be
typeset. Documentation requirements are sure to force changes to the
definition. With use, this prototype will help show what other changes need to
be made to the language.
Will literate C++ work? Think of all the programs you've had to learn over
the years. Now think of those that have been the easiest to understand.
Weren't the ones easiest to understand accompanied by articles in Computer
Language, or Dr. Dobb's, or Byte: code and text -- a literate programming style?
Marco S. Hyman is a principal engineer, designing and writing software for a
company in San Francisco. C++ and object-oriented programming are hobbies he
pursues at home. He can be reached via e-mail (UUCP) at
...!pacbell!dumbcat!marc.
Bibliography
Aho, A.V., B.W. Kernighan, and P.J. Weinberger, The AWK Programming
Language, Addison-Wesley, Reading, Mass. (1988).
Liskov, B., and J. Guttag, Abstraction and Specification in Program
Development, MIT Press, Cambridge, Mass. (1986).
*1 Note: By rights this should be called C++WEB or WEB++. I thought of
lc++ first and like the name so haven't changed it.
*2 Note: Lcpp is written in awk and requires new awk (nawk, for old UNIX
hands). I believe the DOS ports of awk are new awk compatible.
Sidebar: Literate Programming
Literate Programming is the name given by Donald Knuth to a programming
language and documentation system built around the idea that a program can be
considered a work of literature. It is Knuth's belief that a ``practitioner of
literate programming can be regarded as an essayist, whose main concern is with
exposition and excellence of style.'' These main concerns emphasize the goal of
a literate program: explaining to another programmer what the computer is to
do.
Knuth's literate programming is implemented in WEB, a language that
combines the features of two other languages, TeX and PASCAL. WEB programs are
descriptions of software systems. A WEB description is processed by two other
programs, TANGLE and WEAVE, to produce a PASCAL source file and a TeX input
file. When the TeX input file is processed by TeX the output is a ``pretty
printed'' version of the program with supporting documentation.
WEB files are composed of modules with each module consisting of three
parts: TeX explanatory material, definitions (WEB adds simple macros to
PASCAL), and PASCAL code.
Each module is more or less self-contained and should not be so long that
its structure is hidden in its length and complexity. Modules are often a few
lines long; they are rarely longer than a page.
Other versions of WEB or WEB-like languages are also in use. CWEB is
similar to WEB but the output is TeX and C. (This is not to be confused with
the WEB2C tool that converts original WEB to C code.) loom is a preprocessor
written by Janet Incerpi and Robert Sedgewick and used in preparation of
Sedgewick's book Algorithms (Addison-Wesley, Reading, Mass., 1983).
The Communications of the ACM has an occasional column on literate
programming moderated by Christopher J. Van Wyk of AT&T Bell Laboratories. See
the July 1987, December 1987, December 1988, June 1989, and September 1989
issues. The latest column described the language SPIDER which is used to
generate WEBs for other languages.
For more information see also:
Bentley, J., D. Knuth, and D. McIlroy, ``Programming Pearls: A Literate
Program,'' Communications of the ACM, 29,6 (June 1986), 471-483
Knuth, D., ``Literate Programming,'' Computer Journal, 27,2 (1984), 97-111
Knuth, D., The WEB System of Structured Documentation, Stanford Computer
Science Report CS980 (September 1983).
Figure 1
                      ..............
                      . lc++ input .
                      . (file.lc)  .
                      ..............
                             |
           .....................................
           |                                   |
    lcpp awk script                   some future program
           |                                   |
     ..............                  ....................
     |            |                  |                  |
............ .............  ................... ..................
.   C++    . .    C++    .  .      class      . .     class      .
.  Header  . .  Source   .  . use (man page)  . . implementation .
. (file.h) . . (file.cc) .  .    (file.3)     . .   (file.doc)   .
............ .............  ................... ..................
Figure 2
@title           Assign a title to the output files.
@copyright       Put copyright info in output files.
@code            Flag the following lines as code to be written to an
                 output file.
@text            Flag the following lines as text that is not to be
                 written to an output file.
@specification   Start defining a specification.
@class           Define a new class.
@base            Specify a base class for a previous class definition.
@public          Specify a public interface to a class.
@protected       Specify a protected interface to a class.
@private         Specify a private interface to a class.
@requires        Specify member function requirements.
@effects         Specify member function effects.
@implementation  Start defining an implementation.
@inline          Define an inline member function.
@member          Define a member function.
Listing 1 (lcpp)
# @(#) lcpp 12feb90 (msh)

# function timestamp: outputs the file creation timestamp
# this function may not work on non-unix systems
function timestamp( file ) {
    "date" | getline d
    print "// @(#) " file " created " d > file
}

# function notice: outputs the title and do not revise
# notice for the passed file.
function notice( title, file ) {
    print title > file
    print "" > file
    print "// This file generated from the input file " ARGV[1] > file
    print "// DO NOT REVISE THIS FILE." > file
    print "// To make revisions modify the original input file." > file
    print "" > file
}
# function members: keep track of members by class and type
# Entries are kept in the order defined.
function members( type ) {
    $1 = ""
    if ( $2 ~ /@.*/ ) {
        classNum = class[ substr($2,2) ]; $2 = ""
    } else {
        classNum = classCount
    }
    member[classNum,type,member[classNum,type]] = $0
    ++member[classNum,type]
}

# function error: print line number, error message,
# and increase error counter
function error( msg ) {
    print "Line " NR ": " msg
    errors++
}
# function doMembers: output members of a given type for a given class
function doMembers( num, type ) {
    if ( member[num,type] > 0 ) {
        print type ":" > hOut
        for (i = 0; i < member[num,type]; ++i) {
            print " " member[num,type,i] > hOut
        }
    }
}

# function doClass: outputs a class specification from
# the internal class tables
function doClass( num ) {
    # output the class header
    print "" > hOut
    printf "class %s", className[num] > hOut
    # Add any base classes. Output the opening brace.
    for ( i = 0; i < base[num]; ++i ) {
        printf "%s", base[num,i] > hOut
    }
    print " {" > hOut
    # output the various members
    doMembers( num, "public" )
    doMembers( num, "protected" )
    doMembers( num, "private" )
    # terminate the class.
    print "};" > hOut
}

# verify the correct number of arguments and build
# the name of the output files
BEGIN {
    if (ARGC != 2) {
        print "usage: " ARGV[0] " -f lcpp file"
        exit 1
    }
    count = index(ARGV[1],".")
    if (count == 0) {
        hOut = ARGV[1] ".h"
        ccOut = ARGV[1] ".cc"
    } else {
        hOut = substr(ARGV[1],1,count) "h"
        ccOut = substr(ARGV[1],1,count) "cc"
    }
    timestamp(hOut); timestamp(ccOut)
}
# @<anything>: turn off output whenever an @command is found
$1 ~ /^@.*/ { outEnabled = 0 }
# @title: The title is written to both output files as a comment.
# output remains off.
$1 == "@title" {
    $1 = "// title: "; notice( $0, hOut ); notice( $0, ccOut ); next }
# @copyright: Output is turned on so the following copyright info
# is written to both output files.
$1 == "@copyright" { hOutEnabled = 1; ccOutEnabled = 1; outEnabled = 1; next }
# @specification: Marker for the start of a specification.
# direct output to the header file only, but keep output disabled
$1 == "@specification" { hOutEnabled = 1; ccOutEnabled = 0; next }
# @text: Disable output (actually done above, just eat the @text)
$1 == "@text" { next }
# @code: Enable output for the following lines.
$1 == "@code" { outEnabled = 1; next }
# @class: look for class definition. Verify the class name.
# Start storing info in an array entry for the class.
$1 == "@class" {
    if ( NF != 2 ) {
        error( "invalid class definition" )
    } else {
        if ( $2 in class ) {
            error( "duplicate class name" )
        } else {
            ++classCount; classNum = classCount
            class[$2] = classNum; className[classNum] = $2
            base[classNum] = 0
            member[classNum,"public"] = 0
            member[classNum,"protected"] = 0
            member[classNum,"private"] = 0
        }
    }
    next
}
# @base: define a base for the named class. If no class is
# named use the last class defined. Add it to the base class
# array for the appropriate class.
$1 == "@base" {
    if ( $2 ~ /^@.*/ ) {
        classNum = class[ substr($2,2) ]; $2 = ""
    } else {
        classNum = classCount
    }
    if ( classNum ) {
        $1 = base[classNum] == 0 ? ":" : ","
        base[classNum,base[classNum]] = $0
        ++base[classNum]
    } else {
        error( "no class for base definition" )
    }
    next
}
# keep track of public entries by class.
$1 == "@public" { members( "public" ); next }
# keep track of protected entries by class.
$1 == "@protected" { members( "protected" ); next }
# keep track of private entries by class.
$1 == "@private" { members( "private" ); next }
# process @requires. Ignore for now.
$1 == "@requires" { next; }
# process @effects. Ignore for now.
$1 == "@effects" { next; }
# entering the implementation section of the input. Set code output to go
# to the cc file after dumping the classes. Output remains off.
$1 == "@implementation" {
    for ( classNum = 1; classNum <= classCount; classNum++ ) {
        doClass( classNum )
    }
    classCount = 0
    hOutEnabled = 0
    ccOutEnabled = 1
    print "#include \"" hOut "\"" > ccOut
    next
}
# member function definition. Enable output to the cc file.
$1 == "@member" { hOutEnabled = 0; ccOutEnabled = 1; outEnabled = 1; next }
# inline member function. Enable output to the h file.
$1 == "@inline" { hOutEnabled = 1; ccOutEnabled = 0; outEnabled = 1; next }
# check if an invalid @command was given and flag the line number
$1 ~ /^@/ { error( "unknown command" ); next }
# if output is enabled for the header file write this line out
outEnabled == 1 && hOutEnabled == 1 { print $0 > hOut }
# if output is enabled for the cc file write this line out
outEnabled == 1 && ccOutEnabled == 1 { print $0 > ccOut }
END {
    for ( classNum = 1; classNum <= classCount; classNum++ ) {
        doClass( classNum )
    }
    close( hOut );
    close( ccOut );
    if ( errors ) {
        # the leading space keeps the count and the message apart
        print errors " error(s) found"
        exit 1
    } else {
        print "generated " hOut " and " ccOut
    }
}
Listing 2 (test.lc)
@title Example Program
This text does not go in either file.
@copyright
/*
* This class doesn't do anything.
*/
@text
Note: Copyright output is to both files until
the next @command
@specification
Code output is not enabled. If you wish something
to be written to the header file you must turn on
code generation by using an @code
@code
#include <stdio.h>
@text
stdio.h was included above as it is used by
one of the inline functions.
@class testClass
This is where testClass is described.
@private int dataMember;
This is where dataMember is described.
@public inline testClass();
@requires
The requirements, if any, of the testClass
constructor.
@effects
The effects of calling the testClass constructor
@text
General text about the constructor.
@public virtual ~testClass();
@implementation
Text describing implementation issues.
@code
// this will be part of the .cc file
@text
The next function is inline, so it will be added
to the header file. This assumes that the function
has been declared inline above.
@inline
testClass::testClass()
{
@text
Text can be added even in the middle of a function.
Just use @code to start outputting code again.
@code
printf( "testClass constructor\n" );
}
@text
The next function is a member function.
@member
testClass::~testClass()
{
// do something here
}
Listing 3 (test.h)
// @(#) test.h created Thu Mar 29 17:56:47 PST 1990
// title: Example Program
// This file generated from the input file test.lc
// DO NOT REVISE THIS FILE.
// To make revisions modify the original input file.
/*
* This class doesn't do anything.
*/
#include <stdio.h>
class testClass {
public:
    inline testClass();
    virtual ~testClass();
private:
    int dataMember;
};
testClass::testClass()
{
printf( "testClass constructor\n" );
}
Listing 4 (test.cc)
// @(#) test.cc created Thu Mar 29 17:56:47 PST 1990
// title: Example Program
// This file generated from the input file test.lc
// DO NOT REVISE THIS FILE.
// To make revisions modify the original input file.
/*
* This class doesn't do anything.
*/
#include "test.h"
// this will be part of the .cc file
testClass::~testClass()
{
// do something here
}